asg method
On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings
Assran, Mahmoud, Rabbat, Michael
Methods incorporating momentum and acceleration play an important role in the current practice of machine learning (Sutskever et al., 2013; Bottou et al., 2018), where they are commonly used in conjunction with stochastic gradients. However, the theoretical understanding of accelerated methods remains limited when used with stochastic gradients. This paper studies the accelerated gradient (ag) method of Nesterov (1983). Given an initial point $x_0$, and with $x_{-1} = x_0$, the ag method repeats, for $k \geq 0$,

$$y_{k+1} = x_k + \beta (x_k - x_{k-1}) \tag{2}$$

$$x_{k+1} = y_{k+1} - \alpha g_{k+1}, \tag{3}$$

where $\alpha$ and $\beta$ are the step-size and momentum parameters, respectively, and in the deterministic setting, $g_{k+1} = \nabla f(y_{k+1})$. When the momentum parameter $\beta$ is 0, ag simplifies to standard gradient descent (gd). When $\beta > 0$ it is possible to achieve accelerated rates of convergence for certain combinations of $\alpha$ and $\beta$ in the deterministic setting.
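The recursion in (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative sketch and not the authors' implementation. The function name `ag`, the diagonal quadratic test problem, and the parameter choice $\alpha = 1/L$, $\beta = (\sqrt{L} - \sqrt{\mu})/(\sqrt{L} + \sqrt{\mu})$ (a standard constant-momentum choice for an $L$-smooth, $\mu$-strongly convex objective) are all assumptions introduced here. The gradient oracle `grad` is deterministic in this example; a stochastic oracle with the same interface could be passed instead.

```python
import numpy as np

def ag(grad, x0, alpha, beta, num_iters):
    """Run the ag recursion (2)-(3) with a user-supplied gradient oracle.

    `grad` is a hypothetical callable returning g_{k+1} at the point y_{k+1};
    it may be exact (deterministic setting) or noisy (stochastic setting).
    """
    x_prev = np.asarray(x0, dtype=float)  # x_{-1} = x_0
    x = x_prev.copy()                     # x_0
    for _ in range(num_iters):
        y = x + beta * (x - x_prev)       # (2): extrapolation (momentum) step
        g = grad(y)                       # g_{k+1}, evaluated at y_{k+1}
        x_prev, x = x, y - alpha * g      # (3): gradient step taken from y_{k+1}
    return x

# Assumed test problem: f(x) = 0.5 * x^T A x, minimized at x = 0.
A = np.diag([1.0, 100.0])
L, mu = 100.0, 1.0                        # largest and smallest eigenvalues of A

def quad_grad(y):
    return A @ y                          # exact gradient of the quadratic

x0 = np.array([1.0, 1.0])
alpha = 1.0 / L
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))

x_gd = ag(quad_grad, x0, alpha, 0.0, 100)   # beta = 0 recovers gd
x_ag = ag(quad_grad, x0, alpha, beta, 100)  # beta > 0 uses momentum

print("gd distance to minimizer:", np.linalg.norm(x_gd))
print("ag distance to minimizer:", np.linalg.norm(x_ag))
```

On this ill-conditioned quadratic, the run with $\beta > 0$ ends much closer to the minimizer than the run with $\beta = 0$ after the same number of iterations, which is the deterministic acceleration the text refers to. The example says nothing about behavior under stochastic gradients.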